#loading packages
library(DiagrammeR)
Missing data arise when values that were meant to be recorded are absent from a dataset. This can happen for many reasons, intentional or unintentional, and the underlying process is classified into one of three categories, known as missingness mechanisms (Mainzer et al. 2023):
Missing completely at random (MCAR): the probability that a value is missing is independent of every variable, observed or unobserved.
Missing at random (MAR): the probability that a value is missing depends only on the observed values.
Missing not at random (MNAR): the probability that a value is missing depends on the missing values themselves, and possibly on the observed values as well.
Figure 1: Graphical Representation of Missingness Mechanisms (Schafer and Graham 2002)
(X are the completely observed variables. Y are the partly missing variables. Z is the component of the cause of missingness unrelated to X and Y. R is the missingness.)
Looking for patterns in the missing data can help us determine which category they belong to. The mechanism matters because it determines how the missing data should be handled. MCAR is the best-case scenario but seldom occurs; MAR and MNAR are more common.
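To make the three mechanisms concrete, the following base R sketch simulates them on artificial data. Everything in it (the variables `x` and `y`, the 30% missingness rate, and the logistic coefficients) is an assumption chosen for illustration and is not taken from the cited papers.

```r
set.seed(42)
n <- 10000
x <- rnorm(n)              # fully observed variable
y <- 0.8 * x + rnorm(n)    # variable that will receive missing values

# MCAR: every value of y has the same 30% chance of being missing
r_mcar <- runif(n) < 0.3
# MAR: the chance that y is missing depends only on the observed x
r_mar <- runif(n) < plogis(-1 + 1.5 * x)
# MNAR: the chance that y is missing depends on the value of y itself
r_mnar <- runif(n) < plogis(-1 + 1.5 * y)

# Mean of y among the cases that remain observed under each mechanism
obs_means <- c(
  full = mean(y),
  MCAR = mean(y[!r_mcar]),
  MAR  = mean(y[!r_mar]),
  MNAR = mean(y[!r_mnar])
)
round(obs_means, 2)
```

In this simulation the observed mean under MCAR stays close to the full-data mean, while under MAR and MNAR the observed cases are systematically lower, which is why simply ignoring the gaps can bias an analysis.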
The problem with ignoring missing values is that the remaining data may no longer represent the full dataset, which can bias the analysis. Ignoring them also reduces the statistical power of the analysis (van Ginkel et al. 2020). To enhance the quality of the research, two practices should be followed: explicitly acknowledge missing data problems and the conditions under which they occur, and employ principled methods to handle the missing data (Dong and Peng 2013).
There are three types of methods for dealing with missing data: likelihood and Bayesian methods, weighting methods, and imputation methods (Cao et al. 2021). Missing data can also be handled by simply deleting the affected observations.
The likelihood and Bayesian method combines information from a prior predictive distribution with evidence obtained from the sample to predict a value. It requires technical coding and advanced statistical knowledge.
The weighting method is a traditional approach in which weights derived from the available data are used to adjust for non-response in a survey. It becomes inefficient when the weights are extreme or when many weights are needed.
The imputation method replaces each missing value with an estimate derived from the observed data. There are two types of imputation: single and multiple.
Listwise deletion is when the entire observation is removed from the dataset. Deleting missing data can lead to the loss of important information regarding your dataset and is therefore not recommended. In certain cases, when the amount of missing data is small and the type is MCAR, listwise deletion can be used. There usually won’t be bias but potentially important information may be lost.
T-tests and chi-square tests can be used to compare the cases with and without missing values on the other observed variables, to determine whether the two groups differ significantly. According to van Ginkel et al. (2020), a significant result rejects the null hypothesis and indicates that the missing values are not randomly scattered throughout the data, which implies that the missing data are MAR or MNAR. Conversely, a nonsignificant result implies that the data cannot be MAR. It does not eliminate the possibility that the data are MNAR; other information about the population is needed to determine this.
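A minimal sketch of such a test in base R, on simulated data, is shown here. The variables `age` and `income` and the way missingness is generated are assumptions made for the example; the idea is only to show a t-test that compares an observed variable between cases with and without a missing value.

```r
set.seed(1)
n <- 500
age <- rnorm(n, mean = 40, sd = 10)
income <- 50 + 2 * age + rnorm(n, sd = 15)

# Make income more likely to be missing for older subjects (a MAR setup)
income[runif(n) < plogis((age - 45) / 5)] <- NA

# Missingness indicator: TRUE where income is missing
miss <- is.na(income)

# Compare mean age between cases with and without missing income
test <- t.test(age ~ miss)
test$p.value
```

A small p-value here indicates that missingness in `income` is related to `age`, so the values are not scattered completely at random.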
Whenever missing data are categorized as MAR or MNAR, listwise deletion would be wasteful and the analysis biased. Alternative methods of dealing with the missing data are recommended: either pairwise deletion or imputation.
Pairwise deletion excludes an observation only from the calculations that involve the variable it is missing, rather than dropping the entire observation. It allows more data to be analyzed than listwise deletion but limits the ability to make inferences about the total sample. For this reason, imputation is recommended to properly deal with missing data.
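The difference between listwise and pairwise deletion can be seen in a small correlation example. The data frame below is made up for illustration; `cor()` with `use = "complete.obs"` applies listwise deletion, and `use = "pairwise.complete.obs"` applies pairwise deletion.

```r
# Small made-up data frame with one missing value in each column
df <- data.frame(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(2, 1, 4, 3, NA, 6),
  c = c(1, NA, 2, 5, 4, 6)
)

# Listwise deletion: only rows complete on every variable are kept
n_listwise <- nrow(na.omit(df))

# Pairwise deletion: rows usable for the a-b correlation only need a and b
n_pair_ab <- sum(complete.cases(df[, c("a", "b")]))

cor_listwise <- cor(df, use = "complete.obs")
cor_pairwise <- cor(df, use = "pairwise.complete.obs")

c(listwise_rows = n_listwise, pairwise_rows_ab = n_pair_ab)
```

Because each pairwise correlation is computed on a different subset of rows, the resulting estimates do not all describe the same sample, which is the limitation noted above.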
Imputation is the preferred method for handling missing data. It consists of replacing missing data with an estimate obtained from the original, available data, so that a full dataset is available to analyze. To improve statistical power, the number of imputations created should be at least equal to the percentage of missing data (5% calls for 5 imputations, 10% for 10 imputations, 20% for 20 imputations, and so on) (Pedersen et al. 2017). According to Wulff and Jeppesen (2017), 3-5 imputations are sufficient, and 10 are more than enough.
Single, or univariate, imputation is when only one estimate is used to replace the missing data. Methods of single imputation include using the mean, the last observation carried forward, and random imputation. The following is a brief explanation of each:
Using the mean to replace a missing value is a straightforward process. The mean of the observed values of a variable is calculated and substituted for every missing value of that variable. The problem with this method is that it reduces the variance, which leads to confidence intervals that are too narrow.
Last Observation Carried Forward (LOCF) is a technique of replacing a missing value in longitudinal studies with a previously observed value (the most recent value is carried forward) (Streiner 2008). The problem with this method is that it assumes that the previous observed value is perpetual when in reality that most likely is not the case.
Random imputation is a method of randomly drawing an observation and using that observation for any of the missing values. The problem with this method is that it introduces additional variability.
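The three single imputation methods described above can be sketched in a few lines of base R. The vector `x` and the helper `locf()` are hypothetical and written only for this illustration.

```r
set.seed(7)
x <- c(12, NA, 15, 14, NA, NA, 20, 18)   # made-up series with gaps

# Mean imputation: every gap receives the mean of the observed values
x_mean <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# LOCF: every gap receives the most recent observed value
# (hypothetical helper; assumes the first value is observed)
locf <- function(v) {
  for (i in seq_along(v)[-1]) {
    if (is.na(v[i])) v[i] <- v[i - 1]
  }
  v
}
x_locf <- locf(x)

# Random imputation: every gap receives a randomly drawn observed value
x_rand <- x
x_rand[is.na(x)] <- sample(x[!is.na(x)], sum(is.na(x)), replace = TRUE)

# Variance before and after mean imputation
c(observed = var(x, na.rm = TRUE), mean_imputed = var(x_mean))
```

The last line shows the variance shrinking after mean imputation, which is the drawback mentioned for that method.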
These single imputation methods are flawed. They often result in underestimation of standard errors or too small p-values (Dong and Peng 2013), which can cause bias in the analysis. Therefore, multiple imputation is the better method because it handles missing data better and provides less biased results.
Multiple, or multivariate, imputation replaces the missing data with several estimates, creating multiple completed versions of the original dataset. It can be done by using a regression model, or a sequence of regression models, such as linear, logistic, and Poisson. A set of m plausible values is generated for each unobserved data point, resulting in M complete datasets (Dong and Peng 2013). The new values are randomly drawn from predictive distributions, either through joint modeling (JM, which is not used much anymore) or fully conditional specification (FCS) (Wongkamthong and Akande 2023). Each completed dataset is then analyzed, and the results are combined into a single set of estimates.
The purpose of multiple imputation is to create a pool of imputed data for analysis, but if the pooled results are lacking, then multiple imputation should not be done (Mainzer et al. 2023). Multiple imputation may also offer no benefit when there are very few missing values. It is also worth noting that some statistical analysis software already has built-in features to deal with missing data.
Multiple imputation by chained equations, otherwise known as MICE, is the most common and preferred method of multiple imputation (Wulff and Jeppesen 2017). It provides a more reliable way to analyze data with missing values. For this reason, this paper will focus on the methodology and application of the MICE process.
Figure 2: Flowchart of the MICE-process based on procedures proposed by Rubin (Wulff and Jeppesen 2017)
DiagrammeR::grViz("digraph {
# initiate graph
graph [layout = dot, rankdir = LR, label = 'The MICE-Process\n\n',labelloc = t, fontcolor = DarkSlateBlue, fontsize = 45]
# global node settings
node [shape = rectangle, style = filled, fillcolor = AliceBlue, fontcolor = DarkSlateBlue, fontsize = 35]
bgcolor = none
# label nodes
incomplete [label = 'Incomplete data set']
imputed1 [label = 'Imputed \n data set 1']
estimates1 [label = 'Estimates from \n analysis 1']
rubin [label = 'Rubin rules', shape = diamond]
combined [label = 'Combined results']
imputed2 [label = 'Imputed \n data set 2']
estimates2 [label = 'Estimates from \n analysis 2']
imputedm [label = 'Imputed \n data set m']
estimatesm [label = 'Estimates from \n analysis m']
# edge definitions with the node IDs
incomplete -> imputed1 [arrowhead = vee, color = DarkSlateBlue]
imputed1 -> estimates1 [arrowhead = vee, color = DarkSlateBlue]
estimates1 -> rubin [arrowhead = vee, color = DarkSlateBlue]
incomplete -> imputed2 [arrowhead = vee, color = DarkSlateBlue]
imputed2 -> estimates2 [arrowhead = vee, color = DarkSlateBlue]
estimates2 -> rubin [arrowhead = vee, color = DarkSlateBlue]
incomplete -> imputedm [arrowhead = vee, color = DarkSlateBlue]
imputedm -> estimatesm [arrowhead = vee, color = DarkSlateBlue]
estimatesm -> rubin [arrowhead = vee, color = DarkSlateBlue]
rubin -> combined [arrowhead = vee, color = DarkSlateBlue]
}")
*Rubin’s Rules: Average the m estimates. Calculate the standard errors and variance of the m estimates. Combine using an adjustment term (1+1/m).
Other imputation methods are worth noting and are briefly described below.
Regression Imputation is based on a linear regression model. Missing values are randomly drawn from a conditional distribution when variables are continuous and from a logistic regression model when they are categorical (van Ginkel et al. 2020).
Predictive Mean Matching is also based on a linear regression model. The approach is the same as regression imputation when it comes to categorical missing values but different for continuous variables. Instead of random draws from a conditional distribution, missing values are based on predicted values of the outcome variable (van Ginkel et al. 2020).
Hot Deck (HD) imputation is when a missing value is replaced by an observed response of a similar unit, also known as the donor. It can be either random or deterministic (based on a metric or value) (Thongsri and Samart 2022). It does not rely on model fitting.
Stochastic Regression (SR) Imputation is an extension of regression imputation. The process is the same, but a residual term drawn from a normal distribution based on the fitted regression is added to each imputed value (Thongsri and Samart 2022). This maintains the variability of the data.
Random Forest (RF) Imputation is based on machine learning algorithms. Missing values are first replaced with the mean or mode of that particular variable and then the dataset is split into a training set and a prediction set (Thongsri and Samart 2022). The missing values are then replaced with predictions from these sets. This type of imputation can be used on continuous or categorical variables with complex interactions.
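As a rough illustration of how some of these methods differ, the base R sketch below imputes one continuous variable three ways: with plain regression predictions, with stochastic regression, and with a simplified predictive mean matching. The simulated data, the choice of 5 donors, and the omission of parameter uncertainty are all assumptions of this sketch; it is not the algorithm used by any particular package.

```r
set.seed(123)
n <- 200
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
y[sample(n, 60)] <- NA          # remove 60 values of y at random
miss <- is.na(y)

# Regression of y on x, fitted on the observed cases (lm drops NA rows)
fit <- lm(y ~ x)
pred <- predict(fit, newdata = data.frame(x = x[miss]))

# Plain regression imputation: use the prediction itself
y_reg <- y
y_reg[miss] <- pred

# Stochastic regression imputation: prediction plus a normal residual
y_sr <- y
y_sr[miss] <- pred + rnorm(sum(miss), sd = sigma(fit))

# Simplified predictive mean matching: borrow an observed y from one of
# the 5 observed cases whose predicted value is closest
pred_obs <- fitted(fit)
y_obs <- y[!miss]
y_pmm <- y
y_pmm[miss] <- vapply(pred, function(p) {
  donors <- order(abs(pred_obs - p))[1:5]
  y_obs[sample(donors, 1)]
}, numeric(1))

c(observed = var(y_obs), regression = var(y_reg),
  stochastic = var(y_sr), pmm = var(y_pmm))
```

Predictive mean matching only ever imputes values that were actually observed, whereas the two regression variants can produce values that never occur in the data.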
Multiple Imputation by Chained Equations (MICE)
In multiple imputation, m imputed values are created for each missing value, resulting in M complete datasets. For each of the M datasets, an estimate of \(\theta\) is acquired.
The combined estimator of \(\theta\) is given by:
\(\displaystyle {\hat{\theta}}_{M} = \frac{1}{M}\sum_{m = 1}^{M} {\hat{\theta}}_{m}\)
The proposed variance estimator of \({\hat{\theta}}_{M}\) is given by:
\(\displaystyle {\hat{\Phi}}_{M} = {\overline{\phi}}_{M} + \left(1 + \frac{1}{M}\right) B_{M}\)
where \(\displaystyle {\overline{\phi}}_{M} = \frac{1}{M}\sum_{m = 1}^{M} {\hat{\phi}}_{m}\)
and \(\displaystyle B_{M} = \frac{1}{M-1}\sum_{m = 1}^{M} \left({\hat{\theta}}_{m} - {\overline{\theta}}_{M}\right)^{2}\)
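The pooling formulas above can be checked with a short numerical example. The five estimates and variances below are made-up numbers, and the sketch reads \({\overline{\theta}}_{M}\) as the same average as the combined estimator \({\hat{\theta}}_{M}\).

```r
# Hypothetical estimates of theta from M = 5 imputed datasets
theta_hat <- c(1.2, 1.5, 1.1, 1.4, 1.3)
# Hypothetical within-imputation variances of those estimates
phi_hat <- c(0.04, 0.05, 0.03, 0.05, 0.04)

M <- length(theta_hat)

# Combined estimator: the average of the M estimates
theta_M <- mean(theta_hat)

# Average within-imputation variance
phi_bar <- mean(phi_hat)

# Between-imputation variance
B_M <- sum((theta_hat - theta_M)^2) / (M - 1)

# Total variance with the (1 + 1/M) adjustment term
Phi_M <- phi_bar + (1 + 1 / M) * B_M

c(theta_M = theta_M, phi_bar = phi_bar, B_M = B_M, Phi_M = Phi_M)
```

With these inputs the combined estimate is 1.3 and the total variance is 0.042 + 1.2 × 0.025 = 0.072.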
The chained equation process has the following steps (Azur et al. 2011):
1. Perform a simple imputation, such as the mean, for every missing value in the dataset. These imputed values are referred to as “place holders”.
2. Set the “place holder” values for one variable back to missing.
3. Regress the observed values of this variable (the dependent variable) on the other variables (the independent variables) in the model, under the same assumptions that apply when performing linear, logistic, or Poisson regression.
4. Replace the missing values of this variable with predictions (imputations) drawn from the fitted model.
5. Repeat Steps 2-4 for each variable that has missing values. Going through all of these variables once constitutes one cycle.
6. Repeat the cycle several times, updating the imputations each time; the values at the end of the final cycle form one imputed dataset. The whole procedure is repeated until the required “m” imputed datasets have been created.
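The steps above can be sketched in base R for the simplest case, where every incomplete variable is continuous and imputed by linear regression. The simulated data, the choice of 10 cycles, and the use of a normal residual for the draw are assumptions of this sketch; it produces a single imputed dataset and is far simpler than what the mice package does.

```r
set.seed(2024)
n <- 300
x1 <- rnorm(n)
x2 <- 1 + 0.5 * x1 + rnorm(n)
x3 <- x1 - x2 + rnorm(n)
dat <- data.frame(x1, x2, x3)
dat$x2[sample(n, 40)] <- NA
dat$x3[sample(n, 50)] <- NA

miss <- is.na(dat)                           # where the gaps are
vars_mis <- names(dat)[colSums(miss) > 0]    # variables that need imputing

# Step 1: fill every gap with a place holder (the observed column mean)
imp <- dat
for (v in vars_mis) {
  imp[miss[, v], v] <- mean(dat[[v]], na.rm = TRUE)
}

for (cycle in 1:10) {                        # Step 6: repeat the cycle
  for (v in vars_mis) {                      # Step 5: visit each variable
    # Steps 2-3: fit the regression only on rows where v was observed
    form <- reformulate(setdiff(names(dat), v), response = v)
    fit <- lm(form, data = imp[!miss[, v], , drop = FALSE])
    # Step 4: replace the gaps with predictions plus a random residual
    pred <- predict(fit, newdata = imp[miss[, v], , drop = FALSE])
    imp[miss[, v], v] <- pred + rnorm(length(pred), sd = sigma(fit))
  }
}

colSums(is.na(imp))   # no missing values remain
```

Running the whole block several times with different seeds would give several imputed datasets, which is the role played by m in multiple imputation.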
# load data
credit = read.csv("credit_data.csv")
# load libraries
library(gtsummary)
library(dplyr, warn.conflicts=FALSE)
library(mice, warn.conflicts=FALSE)
Credit score data
The credit.csv file is from the website of Dr. Lluís A. Belanche Muñoz, by way of a GitHub repository of Dr. Gaston Sanchez. It contains data on 4,454 subjects and stores a combination of continuous, categorical, and count values for 15 variables. Of the 15 variables, the “Status” variable contains the binary categorical values “good” and “bad”, which describe the kind of credit score each subject has. One data point was missing an outcome and was removed from the original data.
| Variable | Type | Description |
|---|---|---|
| X | Integer | Count variable indicating the number of subjects. |
| Status | Character | 2-level categorical variable indicating the status of the subject’s credit: good or bad. |
| Seniority | Integer | Count variable indicating the seniority a subject has accumulated over the course of their life. |
| Home | Character | 6-level categorical variable indicating the subject’s relationship to their residential address: rent, owner, parents, priv, other, or ignore. |
| Time | Integer | Count variable showing how many months have elapsed since the subject’s payment deadline without paying their debt in full. |
| Age | Integer | Count variable indicating subject’s age (in years). |
| Marital | Character | 5-level categorical variable indicating the subject’s marital status: single, married, separated, divorced, or widow. |
| Records | Character | 2-level categorical variable indicating whether the subject has a credit history record: yes or no. |
| Job | Character | 4-level categorical variable indicating the type of job the subject has: fixed, freelance, partime, or others. |
| Expenses | Integer | Count variable indicating the amount of expenses (in USD) a subject has. |
| Income | Integer | Count variable indicating the amount of income (in thousands of USD) a subject earns annually. |
| Assets | Integer | Count variable indicating the amount of assets (in USD) a subject has. |
| Debt | Integer | Count variable indicating the amount of debt (in USD) a subject has. |
| Amount | Integer | Count variable indicating the amount of money (in USD) remaining in a subject’s bank account. |
| Price | Integer | Count variable indicating the amount of money a subject earns by the end of the month. |
credit %>%
tbl_summary(by = Status,
missing_text = "NA") %>%
add_p() %>%
add_n() %>%
add_overall() %>%
modify_header(label ~ "**Variable**") %>%
modify_caption("**Summary of Credit Data**") %>%
bold_labels()
| Variable | N | Overall, N = 4,454 ¹ | bad, N = 1,254 ¹ | good, N = 3,200 ¹ | p-value ² |
|---|---|---|---|---|---|
| X | 4,454 | 2,228 (1,114, 3,341) | 2,222 (1,142, 3,366) | 2,232 (1,098, 3,326) | 0.3 |
| Seniority | 4,454 | 5 (2, 12) | 2 (1, 6) | 7 (2, 14) | <0.001 |
| Home | 4,448 | | | | <0.001 |
| ignore | | 20 (0.4%) | 9 (0.7%) | 11 (0.3%) | |
| other | | 319 (7.2%) | 146 (12%) | 173 (5.4%) | |
| owner | | 2,107 (47%) | 390 (31%) | 1,717 (54%) | |
| parents | | 783 (18%) | 233 (19%) | 550 (17%) | |
| priv | | 246 (5.5%) | 84 (6.7%) | 162 (5.1%) | |
| rent | | 973 (22%) | 388 (31%) | 585 (18%) | |
| NA | | 6 | 4 | 2 | |
| Time | 4,454 | 48 (36, 60) | 48 (36, 60) | 48 (36, 60) | <0.001 |
| Age | 4,454 | 36 (28, 45) | 34 (27, 42) | 36 (28, 46) | <0.001 |
| Marital | 4,453 | | | | <0.001 |
| divorced | | 38 (0.9%) | 14 (1.1%) | 24 (0.8%) | |
| married | | 3,241 (73%) | 829 (66%) | 2,412 (75%) | |
| separated | | 130 (2.9%) | 64 (5.1%) | 66 (2.1%) | |
| single | | 977 (22%) | 328 (26%) | 649 (20%) | |
| widow | | 67 (1.5%) | 19 (1.5%) | 48 (1.5%) | |
| NA | | 1 | 0 | 1 | |
| Records | 4,454 | 773 (17%) | 429 (34%) | 344 (11%) | <0.001 |
| Job | 4,452 | | | | <0.001 |
| fixed | | 2,805 (63%) | 580 (46%) | 2,225 (70%) | |
| freelance | | 1,024 (23%) | 333 (27%) | 691 (22%) | |
| others | | 171 (3.8%) | 68 (5.4%) | 103 (3.2%) | |
| partime | | 452 (10%) | 271 (22%) | 181 (5.7%) | |
| NA | | 2 | 2 | 0 | |
| Expenses | 4,454 | 51 (35, 72) | 49 (35, 75) | 52 (35, 68) | 0.8 |
| Income | 4,073 | 125 (90, 170) | 100 (74, 148) | 130 (100, 178) | <0.001 |
| NA | | 381 | 217 | 164 | |
| Assets | 4,407 | 3,000 (0, 6,000) | 0 (0, 4,000) | 4,000 (0, 7,000) | <0.001 |
| NA | | 47 | 20 | 27 | |
| Debt | 4,436 | 0 (0, 0) | 0 (0, 0) | 0 (0, 0) | 0.3 |
| NA | | 18 | 13 | 5 | |
| Amount | 4,454 | 1,000 (700, 1,300) | 1,100 (800, 1,415) | 1,000 (700, 1,250) | <0.001 |
| Price | 4,454 | 1,400 (1,117, 1,692) | 1,423 (1,062, 1,728) | 1,400 (1,134, 1,678) | >0.9 |

¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Pearson's Chi-squared test
First, we evaluate the dataset for missing values. As indicated in the table, the data do contain NA/missing values. We can create a table that shows each variable and how many missing values it has:
# Shows which variables have missing values and how many
colSums(is.na(credit))
        X    Status Seniority      Home      Time       Age   Marital   Records
        0         0         0         6         0         0         1         0
      Job  Expenses    Income    Assets      Debt    Amount     Price
        2         0       381        47        18         0         0
We now must analyze the data to decide how we intend to handle the missing values. To do this, we create a new dataset, called new_credit, from which every row with missing data has been deleted. We preserve the original dataset so that we can later apply the method we choose to address the missing values. We can then count the rows of the new dataset to determine how many rows were deleted in total.
# Creates a new dataset excluding missing values
new_credit = na.omit(credit)
# Number of rows of new dataset
nrow(new_credit)
[1] 4039
We started out with 4,454 rows and our new dataset has 4,039. 415 rows were deleted due to the missing data. To run regression, we would be throwing away 9.3% of our data, because of missingness. Instead, we can use multiple imputation to impute the missing values so that we don’t have to discard such valuable information.
Using the mice (Multivariate Imputation by Chained Equations) package in R, a statistical programming language, we will create multiple datasets in which the missing values are replaced with imputed values. Because just under 10% of the rows in our dataset contain missing data, we will generate 10 imputations, or 10 new datasets. The mice package does this by drawing plausible values based on the other columns and placing them into the cells where data are missing.
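The arithmetic behind this choice can be reproduced from the row counts reported above. The variable names are illustrative, and rounding the percentage up to get the number of imputations is one simple reading of the rule that the number of imputations should be at least the percentage of missing data.

```r
n_total <- 4454      # rows in the original dataset
n_complete <- 4039   # rows left after na.omit()

# Rows lost to listwise deletion
n_dropped <- n_total - n_complete

# Percentage of rows lost, rounded to one decimal place
pct_dropped <- round(100 * n_dropped / n_total, 1)

# Number of imputations: the percentage rounded up
m_suggested <- ceiling(pct_dropped)

c(dropped = n_dropped, percent = pct_dropped, m = m_suggested)
```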
The first step is to check the missingness by looking for patterns in the original dataset using the md.pattern() function:
credit <- credit[-c(1)]
md.pattern(credit, rotate.names = TRUE)
     Status Seniority Time Age Records Expenses Amount Price Marital Job Home
4039      1         1    1   1       1        1      1     1       1   1    1
366       1         1    1   1       1        1      1     1       1   1    1
22        1         1    1   1       1        1      1     1       1   1    1
7         1         1    1   1       1        1      1     1       1   1    1
8         1         1    1   1       1        1      1     1       1   1    1
4         1         1    1   1       1        1      1     1       1   1    1
3         1         1    1   1       1        1      1     1       1   1    0
2         1         1    1   1       1        1      1     1       1   1    0
1         1         1    1   1       1        1      1     1       1   0    1
1         1         1    1   1       1        1      1     1       1   0    0
1         1         1    1   1       1        1      1     1       0   1    1
          0         0    0   0       0        0      0     0       1   2    6
     Debt Assets Income
4039    1      1      1   0
366     1      1      0   1
22      1      0      1   1
7       1      0      0   2
8       0      0      1   2
4       0      0      0   3
3       0      0      1   3
2       0      0      0   4
1       1      1      0   2
1       0      0      0   5
1       1      1      1   1
       18     47    381 455
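In the plot that md.pattern() draws, blue marks observed values and red marks missing values. In the printed table, 1 indicates an observed value and 0 a missing value; the left-hand number of each row is the count of rows showing that pattern, and the right-hand number is how many variables are missing in it. There are 11 patterns.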
In order to perform multiple imputation on categorical data, all string variables must be converted to factors using the as.factor() function (van Buuren 2011):
credit$Status = as.factor(credit$Status)
credit$Home = as.factor(credit$Home)
credit$Marital = as.factor(credit$Marital)
credit$Records = as.factor(credit$Records)
credit$Job = as.factor(credit$Job)
Using the mice() function, 10 multiple imputations for the missing values will be generated. The default is 5, so m must be set to the number of imputations desired. Since the variables in the dataset are both numerical and categorical (with two or more levels), the defaultMethod argument contains pmm: predictive mean matching (numeric data); logreg: logistic regression imputation (binary data, factor with 2 levels); polyreg: polytomous regression imputation for unordered categorical data (factor with > 2 levels); polr: proportional odds model (ordered factor with > 2 levels). The seed argument is given the value 1337 (any number can be used here) so that the same results are retrieved each time the multiple imputation is performed.
Multiple_Imputation = mice(data = credit, maxit = 10, m = 10, defaultMethod = c("pmm", "logreg", "polyreg", "polr"), seed = 1337)
iter imp variable
1 1 Home Marital Job Income Assets Debt
1 2 Home Marital Job Income Assets Debt
1 3 Home Marital Job Income Assets Debt
1 4 Home Marital Job Income Assets Debt
1 5 Home Marital Job Income Assets Debt
1 6 Home Marital Job Income Assets Debt
1 7 Home Marital Job Income Assets Debt
1 8 Home Marital Job Income Assets Debt
1 9 Home Marital Job Income Assets Debt
1 10 Home Marital Job Income Assets Debt
2 1 Home Marital Job Income Assets Debt
2 2 Home Marital Job Income Assets Debt
2 3 Home Marital Job Income Assets Debt
2 4 Home Marital Job Income Assets Debt
2 5 Home Marital Job Income Assets Debt
2 6 Home Marital Job Income Assets Debt
2 7 Home Marital Job Income Assets Debt
2 8 Home Marital Job Income Assets Debt
2 9 Home Marital Job Income Assets Debt
2 10 Home Marital Job Income Assets Debt
3 1 Home Marital Job Income Assets Debt
3 2 Home Marital Job Income Assets Debt
3 3 Home Marital Job Income Assets Debt
3 4 Home Marital Job Income Assets Debt
3 5 Home Marital Job Income Assets Debt
3 6 Home Marital Job Income Assets Debt
3 7 Home Marital Job Income Assets Debt
3 8 Home Marital Job Income Assets Debt
3 9 Home Marital Job Income Assets Debt
3 10 Home Marital Job Income Assets Debt
4 1 Home Marital Job Income Assets Debt
4 2 Home Marital Job Income Assets Debt
4 3 Home Marital Job Income Assets Debt
4 4 Home Marital Job Income Assets Debt
4 5 Home Marital Job Income Assets Debt
4 6 Home Marital Job Income Assets Debt
4 7 Home Marital Job Income Assets Debt
4 8 Home Marital Job Income Assets Debt
4 9 Home Marital Job Income Assets Debt
4 10 Home Marital Job Income Assets Debt
5 1 Home Marital Job Income Assets Debt
5 2 Home Marital Job Income Assets Debt
5 3 Home Marital Job Income Assets Debt
5 4 Home Marital Job Income Assets Debt
5 5 Home Marital Job Income Assets Debt
5 6 Home Marital Job Income Assets Debt
5 7 Home Marital Job Income Assets Debt
5 8 Home Marital Job Income Assets Debt
5 9 Home Marital Job Income Assets Debt
5 10 Home Marital Job Income Assets Debt
6 1 Home Marital Job Income Assets Debt
6 2 Home Marital Job Income Assets Debt
6 3 Home Marital Job Income Assets Debt
6 4 Home Marital Job Income Assets Debt
6 5 Home Marital Job Income Assets Debt
6 6 Home Marital Job Income Assets Debt
6 7 Home Marital Job Income Assets Debt
6 8 Home Marital Job Income Assets Debt
6 9 Home Marital Job Income Assets Debt
6 10 Home Marital Job Income Assets Debt
7 1 Home Marital Job Income Assets Debt
7 2 Home Marital Job Income Assets Debt
7 3 Home Marital Job Income Assets Debt
7 4 Home Marital Job Income Assets Debt
7 5 Home Marital Job Income Assets Debt
7 6 Home Marital Job Income Assets Debt
7 7 Home Marital Job Income Assets Debt
7 8 Home Marital Job Income Assets Debt
7 9 Home Marital Job Income Assets Debt
7 10 Home Marital Job Income Assets Debt
8 1 Home Marital Job Income Assets Debt
8 2 Home Marital Job Income Assets Debt
8 3 Home Marital Job Income Assets Debt
8 4 Home Marital Job Income Assets Debt
8 5 Home Marital Job Income Assets Debt
8 6 Home Marital Job Income Assets Debt
8 7 Home Marital Job Income Assets Debt
8 8 Home Marital Job Income Assets Debt
8 9 Home Marital Job Income Assets Debt
8 10 Home Marital Job Income Assets Debt
9 1 Home Marital Job Income Assets Debt
9 2 Home Marital Job Income Assets Debt
9 3 Home Marital Job Income Assets Debt
9 4 Home Marital Job Income Assets Debt
9 5 Home Marital Job Income Assets Debt
9 6 Home Marital Job Income Assets Debt
9 7 Home Marital Job Income Assets Debt
9 8 Home Marital Job Income Assets Debt
9 9 Home Marital Job Income Assets Debt
9 10 Home Marital Job Income Assets Debt
10 1 Home Marital Job Income Assets Debt
10 2 Home Marital Job Income Assets Debt
10 3 Home Marital Job Income Assets Debt
10 4 Home Marital Job Income Assets Debt
10 5 Home Marital Job Income Assets Debt
10 6 Home Marital Job Income Assets Debt
10 7 Home Marital Job Income Assets Debt
10 8 Home Marital Job Income Assets Debt
10 9 Home Marital Job Income Assets Debt
10 10 Home Marital Job Income Assets Debt
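To illustrate how the four default methods line up with column types, the base R sketch below converts the character columns of a toy data frame to factors and then assigns a method name to each column. The data frame `toy` and the helper `pick_method()` are hypothetical and written for this illustration; they mirror the description of defaultMethod given earlier and are not the internal code of mice().

```r
# Toy data with mixed column types (made-up values)
toy <- data.frame(
  income = c(120, NA, 95, 150),
  status = c("good", "bad", "good", NA),
  home   = c("rent", "owner", NA, "parents"),
  stringsAsFactors = FALSE
)
toy$grade <- factor(c("low", "mid", "high", NA),
                    levels = c("low", "mid", "high"), ordered = TRUE)

# Convert every character column to a factor in one pass
toy[] <- lapply(toy, function(col) {
  if (is.character(col)) as.factor(col) else col
})

# Hypothetical helper: choose a method name from the column type
pick_method <- function(col) {
  if (is.numeric(col)) return("pmm")                            # numeric data
  if (is.factor(col) && nlevels(col) == 2) return("logreg")     # 2 levels
  if (is.ordered(col)) return("polr")                           # ordered, > 2 levels
  if (is.factor(col)) return("polyreg")                         # unordered, > 2 levels
  NA_character_
}

methods <- vapply(toy, pick_method, character(1))
methods
```

Applying the same one-pass conversion to a real data frame is an alternative to converting each column with a separate as.factor() call.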
The following R code will show the imputed values. Columns are imputations, rows are observations.
head(Multiple_Imputation$imp, 10)
$Status
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Seniority
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Home
1 2 3 4 5 6 7 8 9
30 parents owner owner rent parents rent parents parents owner
240 owner parents owner rent owner owner owner other parents
1060 owner owner parents rent other parents parents rent parents
1677 owner rent owner other owner other owner priv rent
2389 other parents owner parents priv other owner parents priv
2996 owner owner owner other other other rent rent owner
10
30 other
240 owner
1060 parents
1677 owner
2389 priv
2996 owner
$Time
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Age
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Marital
1 2 3 4 5 6 7 8 9 10
3319 married married single married widow married single single widow married
$Records
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Job
1 2 3 4 5 6 7 8 9
30 freelance partime partime fixed freelance fixed fixed freelance partime
912 partime partime partime partime fixed partime fixed fixed partime
10
30 freelance
912 fixed
$Expenses
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Income
1 2 3 4 5 6 7 8 9 10
30 100 160 125 73 69 103 96 165 116 93
114 80 53 70 145 105 120 92 92 150 109
144 300 100 185 250 90 250 315 254 230 373
153 130 152 40 224 142 130 76 70 60 102
158 150 195 100 98 90 140 160 122 265 176
177 250 245 250 241 142 254 241 230 250 321
195 120 180 98 162 186 78 72 102 150 144
206 115 200 203 108 197 70 190 177 100 70
241 142 85 212 243 109 208 130 152 214 155
242 150 144 115 120 538 212 91 150 126 169
278 60 130 115 100 79 118 78 145 96 170
318 80 150 59 57 107 90 120 142 200 80
330 156 125 159 128 198 120 215 100 120 24
333 176 211 150 145 149 160 113 152 162 114
335 168 113 132 109 180 318 99 51 75 130
356 100 176 105 113 170 188 100 135 143 156
360 113 80 95 164 160 140 69 83 124 80
394 500 500 350 500 905 491 150 500 350 500
404 130 150 251 107 121 96 70 280 71 182
422 71 174 92 225 154 330 150 135 58 174
439 115 103 160 136 60 88 98 168 189 145
444 275 210 136 153 80 78 242 63 80 155
462 156 210 158 120 244 84 103 100 92 102
469 93 167 95 104 92 127 120 55 139 86
479 84 50 84 169 89 140 93 75 178 115
481 200 148 205 207 165 95 154 340 220 140
483 90 124 134 120 143 120 198 80 175 145
485 80 225 85 191 254 98 82 118 104 129
496 66 93 90 58 125 50 73 70 80 221
498 126 312 101 86 80 102 105 80 177 60
505 86 73 140 109 60 40 56 120 112 135
567 114 79 110 42 95 205 200 50 133 114
572 73 80 92 42 150 148 90 218 150 48
582 63 33 40 67 121 85 108 75 46 33
648 430 500 373 230 292 183 50 191 250 171
653 86 108 120 130 120 200 140 88 98 74
667 230 183 230 183 800 200 183 416 905 91
675 400 300 208 466 125 87 69 208 185 300
678 315 56 145 105 178 139 126 155 200 190
699 157 85 65 120 135 160 189 170 190 235
708 100 285 125 50 189 200 81 160 194 100
714 70 91 80 163 45 125 8 115 81 92
716 202 135 139 83 70 120 92 75 42 45
733 125 110 40 85 300 150 221 147 118 243
734 165 100 70 112 95 130 80 118 88 160
746 400 129 121 100 87 175 81 100 175 123
777 80 106 120 175 139 80 57 115 112 67
781 46 160 84 122 103 85 87 70 69 115
785 111 200 138 101 100 466 250 126 200 313
804 118 135 191 71 117 247 140 161 93 75
824 92 123 117 41 95 127 178 67 50 79
865 114 60 125 60 71 114 175 81 50 110
866 296 136 195 200 115 101 140 60 130 115
880 132 100 150 138 109 189 95 167 212 115
889 145 359 204 90 100 113 192 50 120 154
906 92 380 428 275 207 535 190 120 99 200
912 92 102 78 74 99 131 108 80 74 285
942 90 135 120 65 90 115 53 104 63 60
952 170 120 125 128 105 188 130 145 139 250
989 70 80 70 180 140 160 350 117 120 80
1001 82 110 63 71 40 105 115 50 85 96
1017 125 130 130 130 184 80 192 80 120 113
1039 128 190 53 61 185 350 250 128 90 100
1044 99 70 50 60 103 35 83 57 90 47
1069 138 250 173 225 70 100 183 63 110 240
1100 102 86 92 335 60 47 30 107 53 77
1111 65 25 100 175 111 80 111 72 78 45
1125 123 187 86 240 42 120 100 180 105 111
1168 246 147 155 606 218 201 107 207 235 500
1208 225 384 200 69 120 158 160 464 275 58
1226 220 180 51 188 335 170 300 382 150 219
1250 230 128 142 200 128 142 65 130 85 175
1257 197 185 45 217 120 298 214 240 260 150
1276 57 142 166 80 200 177 123 196 114 200
1281 95 74 70 117 105 110 300 92 57 70
1289 90 192 90 149 88 73 33 93 90 38
1297 120 198 189 77 127 200 146 218 116 115
1307 400 86 350 200 200 107 80 115 130 110
1314 120 84 164 161 400 70 100 131 340 64
1335 240 85 67 92 206 119 100 256 100 184
1364 275 229 606 146 150 500 144 289 50 120
1365 250 300 166 411 200 67 145 154 555 250
1366 466 140 152 205 125 154 250 200 250 95
1392 150 150 500 500 905 491 150 150 491 337
1421 200 153 210 350 243 125 161 106 131 54
1427 110 193 164 122 130 110 134 419 208 130
1433 90 44 124 113 53 50 70 95 60 60
1436 120 122 130 111 218 178 120 179 115 364
1437 107 81 176 94 113 110 135 80 194 140
1441 90 65 187 100 450 74 92 140 73 123
1456 93 92 140 140 176 204 90 90 227 75
1473 275 229 118 70 166 155 187 384 154 143
1509 128 110 135 190 142 137 160 42 159 195
1513 65 81 125 127 70 84 60 66 60 76
1530 56 140 130 161 135 82 129 80 90 60
1535 160 128 131 160 169 198 50 115 59 126
1536 180 120 250 250 350 300 231 288 100 220
1544 100 125 156 90 56 63 230 88 95 80
1549 90 70 125 142 126 106 180 100 175 77
1564 50 208 202 135 85 172 90 116 180 100
1580 114 110 130 67 103 113 74 45 111 60
1583 156 125 147 200 67 28 90 198 75 60
1598 39 251 250 146 200 55 150 135 200 211
1599 113 140 145 150 107 350 81 133 35 134
1619 85 70 218 100 90 143 349 126 176 55
1629 136 103 197 106 81 152 103 180 145 137
1648 58 82 140 106 115 52 75 45 90 61
1662 175 148 80 99 65 100 88 190 174 137
1677 179 225 155 42 200 113 200 103 87 257
1685 250 60 87 110 260 400 79 210 241 142
1722 185 211 72 153 216 426 87 250 135 146
1724 87 70 150 59 300 175 145 52 104 100
1733 174 92 100 200 289 113 247 110 394 100
1741 60 78 137 55 93 63 67 106 80 70
1745 150 100 120 149 147 80 198 150 242 190
1753 101 48 58 150 90 73 140 45 30 70
1762 148 117 85 91 140 132 27 63 48 198
1766 126 300 300 92 150 160 217 200 145 79
1771 148 60 120 500 214 413 255 100 105 35
1798 150 135 97 72 135 150 180 91 100 78
1802 150 491 500 500 200 491 500 150 500 500
1803 170 100 200 92 140 97 130 156 232 150
1807 114 75 110 192 28 155 80 126 121 225
1811 89 99 113 110 157 42 71 80 119 72
1844 86 80 138 191 251 31 121 100 234 199
1851 250 250 151 260 430 250 166 151 430 250
1852 194 70 283 120 127 107 83 72 100 198
1870 214 97 89 74 75 95 167 77 180 152
1872 57 77 70 116 350 138 200 88 79 220
1882 70 77 92 136 76 139 52 90 42 50
1883 200 160 83 216 125 193 203 88 179 145
1893 500 350 200 491 183 241 241 905 150 500
1898 185 80 75 118 70 91 159 101 235 81
1903 86 120 165 65 91 219 72 55 120 65
1907 160 205 83 300 236 96 382 394 148 195
1920 48 100 80 92 116 120 94 130 25 63
1936 199 85 230 179 61 355 65 157 168 301
1946 106 100 125 144 64 93 79 103 95 105
1948 169 107 88 80 142 110 128 92 70 66
1962 127 87 137 135 140 120 60 97 117 90
1963 314 75 75 220 51 148 210 260 158 140
1965 85 122 115 148 500 95 116 144 111 87
1970 250 189 151 230 260 250 250 300 250 100
1972 150 500 200 500 150 150 905 491 500 500
1977 50 102 45 248 200 90 160 170 200 245
1979 82 78 211 162 130 113 150 200 177 157
1980 88 19 90 48 63 72 60 110 120 72
1984 200 247 72 60 223 142 186 160 66 170
2006 90 120 75 150 121 140 124 160 92 112
2016 240 107 100 225 168 162 100 52 104 142
2022 137 148 115 117 107 141 110 110 121 87
2025 108 240 188 68 335 250 133 212 250 150
2042 242 150 60 205 80 200 214 98 130 63
2043 120 167 96 35 183 80 58 180 140 120
2076 110 168 180 147 88 95 100 150 180 70
2077 172 190 110 187 110 180 177 210 201 285
2083 116 50 110 133 198 95 117 245 61 140
2156 173 315 138 200 95 175 160 161 160 225
2157 176 100 140 110 184 77 51 107 98 120
2186 81 107 83 70 92 160 364 67 77 75
2197 121 100 75 67 92 56 84 60 172 170
2205 235 236 145 79 79 155 190 296 191 110
2218 200 115 55 80 104 126 160 125 91 184
2227 130 120 102 60 85 130 96 35 66 96
2233 295 146 280 170 150 70 400 154 168 102
2240 207 289 154 340 275 157 301 80 120 19
2257 142 88 140 200 117 63 199 111 132 120
2280 151 700 144 350 144 250 144 171 186 275
2291 130 382 211 41 301 250 100 135 285 107
2297 65 70 30 46 63 69 161 120 75 66
2304 88 75 90 46 46 135 51 53 49 49
2310 233 175 270 373 300 250 125 195 257 100
2323 135 50 60 56 78 92 42 60 49 128
2331 905 491 241 500 200 200 200 491 150 337
2337 192 55 60 182 75 125 72 70 87 40
2349 248 137 200 214 173 97 214 200 264 114
2365 700 150 380 260 188 105 79 131 128 320
2369 165 120 201 200 158 260 120 196 170 110
2387 148 100 92 141 130 60 65 60 45 60
2396 130 150 140 95 90 140 60 115 25 74
2399 470 96 67 115 72 86 180 240 107 125
2402 115 154 220 42 428 300 289 223 190 380
2404 160 200 126 65 165 150 150 81 96 189
2437 166 250 189 115 191 315 250 60 400 195
2445 159 100 174 128 40 137 152 96 125 87
2446 285 110 150 108 250 69 242 154 65 105
2453 148 185 21 60 125 65 83 105 122 77
2460 95 81 66 171 149 135 58 40 152 182
2467 70 65 125 52 110 140 85 100 72 119
2473 108 65 113 146 104 148 140 56 140 95
2490 78 88 102 82 23 100 119 96 85 37
2495 60 92 82 63 71 139 70 50 161 66
2505 67 157 145 135 226 164 138 217 152 190
2566 160 40 120 114 220 123 90 199 160 225
2572 125 120 225 110 100 53 110 90 78 80
2578 76 50 107 160 235 82 87 150 107 142
2584 63 130 116 74 78 131 51 45 169 129
2596 175 55 78 76 102 178 42 102 81 56
2605 102 94 146 140 75 80 71 102 130 100
2614 100 210 130 170 250 172 150 242 189 222
2624 144 80 70 130 140 79 96 144 95 74
2625 85 170 198 162 135 90 120 200 90 300
2631 200 95 90 112 165 118 130 150 50 230
2632 295 110 197 233 178 71 130 116 92 100
2651 228 100 150 144 147 57 120 55 150 62
2652 35 210 42 118 70 75 147 77 180 79
2653 91 271 230 148 214 156 52 170 150 100
2668 100 155 195 76 265 111 125 80 192 189
2676 216 80 150 115 203 114 50 210 157 173
2681 65 122 142 92 60 120 200 106 91 140
2683 105 117 90 199 80 122 140 108 155 120
2695 141 121 114 93 196 200 107 155 135 130
2696 175 50 75 92 65 86 80 110 83 27
2707 167 218 130 67 127 240 105 122 129 170
2720 120 120 190 62 107 112 190 250 25 177
2723 90 86 74 56 55 130 47 82 60 100
2725 69 50 100 65 345 93 87 114 359 82
2730 150 84 212 180 150 159 100 88 201 50
2769 110 75 120 130 166 107 98 72 76 133
2780 77 60 120 93 68 50 85 80 68 86
2781 150 173 99 245 300 535 128 229 171 179
2802 215 180 98 84 166 92 242 91 120 170
2805 242 126 161 183 164 150 155 70 70 145
2806 82 106 45 70 69 200 54 158 85 80
2807 181 120 233 199 137 130 99 120 70 100
2810 88 110 250 167 163 139 220 115 155 120
2813 173 150 133 190 100 75 150 69 71 200
2815 100 76 115 119 163 101 117 100 62 46
2825 340 103 200 70 230 178 144 125 83 117
2854 300 135 114 128 300 96 68 250 79 442
2869 200 145 175 120 165 150 210 126 92 115
2882 86 90 75 65 47 53 55 78 53 52
2884 200 51 80 130 65 220 160 100 99 62
2893 450 130 115 105 104 110 198 129 199 80
2915 85 119 180 123 75 270 211 136 140 60
2927 53 70 100 156 64 135 155 105 136 69
2935 178 170 285 127 78 86 88 70 60 140
2936 189 72 191 300 128 695 150 218 162 190
2939 120 79 64 71 82 128 173 194 113 135
2951 250 178 200 416 171 178 416 245 491 337
2954 126 125 250 800 275 50 231 959 186 250
2969 130 120 250 206 80 176 500 270 136 154
2971 155 340 72 155 160 200 182 175 172 170
2979 300 240 180 94 531 143 94 94 300 300
2983 218 133 85 125 126 95 80 217 150 125
2991 105 149 120 127 250 92 49 121 150 182
2996 112 72 214 72 128 132 114 160 150 167
2999 217 428 251 100 380 350 143 144 107 700
3008 250 531 321 144 150 191 250 220 700 700
3014 101 181 160 100 76 350 135 57 78 128
3021 250 110 97 150 180 126 119 100 150 110
3026 110 64 415 80 216 148 117 109 100 180
3031 120 25 70 55 110 83 73 80 136 209
3038 85 49 31 49 67 48 33 33 121 53
3040 298 217 150 217 120 250 165 606 150 300
3069 189 54 210 215 200 100 250 173 125 75
3080 133 28 90 80 67 67 85 100 78 140
3096 142 113 195 115 160 60 88 75 135 300
3104 47 54 95 110 115 54 60 83 72 130
3106 101 93 149 70 140 135 200 162 251 77
3110 190 400 126 245 187 236 50 100 150 128
3121 138 101 383 126 450 101 111 69 50 101
3123 150 133 60 217 54 112 140 100 162 169
3139 300 257 459 100 270 314 100 250 364 257
3167 77 125 80 145 72 170 120 60 115 60
3170 160 86 190 112 85 107 152 117 230 160
3183 80 170 240 75 250 144 180 318 100 71
3185 110 190 80 165 77 168 100 110 200 178
3187 184 70 128 135 60 202 153 185 71 69
3203 144 92 219 195 74 160 80 71 117 154
3218 102 65 39 90 124 105 124 75 80 245
3222 188 212 95 110 198 172 280 149 180 106
3229 66 137 136 150 324 125 110 159 89 107
3233 92 180 129 400 400 230 606 118 442 100
3237 124 220 189 187 128 169 150 101 100 103
3245 145 120 55 85 119 119 139 125 233 120
3252 100 62 56 63 120 122 180 93 85 115
3266 134 160 54 170 117 195 105 145 110 133
3286 140 327 78 110 118 75 75 126 110 125
3288 160 106 93 277 66 202 100 125 127 120
3304 178 241 178 500 241 200 905 200 905 337
3310 102 155 110 225 120 50 200 125 87 96
3316 156 200 67 84 101 158 105 208 84 113
3325 78 102 101 216 188 140 73 79 140 45
3336 117 153 90 176 203 132 211 226 220 277
3338 91 800 416 178 416 200 245 178 200 183
3345 202 186 500 130 101 104 298 150 147 90
3352 207 163 74 196 74 165 345 100 200 150
3365 117 130 101 56 149 83 127 139 175 120
3382 223 190 178 130 145 145 156 63 90 366
3433 85 209 50 80 91 52 80 95 140 42
3439 74 95 135 65 90 78 139 63 61 139
3451 146 40 58 115 62 130 58 60 140 76
3452 53 52 120 100 60 75 60 96 75 42
3454 160 115 87 415 170 170 165 159 75 160
3456 135 72 95 73 70 90 48 120 72 63
3461 120 135 208 300 191 154 54 90 400 117
3462 181 125 60 60 130 140 129 163 70 160
3473 120 75 107 256 90 142 100 140 125 73
3477 90 108 67 130 60 140 150 135 135 100
3478 138 25 67 65 208 74 71 140 85 265
3494 147 80 78 127 134 61 105 103 100 78
3513 160 99 127 106 100 105 75 150 122 78
3523 293 156 122 180 75 190 95 102 140 100
3525 211 124 157 115 110 105 125 165 250 315
3534 200 190 117 114 120 210 102 300 135 130
3556 78 230 201 185 333 60 170 120 200 174
3641 150 230 290 160 100 238 214 150 196 92
3645 71 144 140 95 115 111 143 197 52 125
3657 158 80 98 60 92 52 51 125 41 60
3674 150 178 158 95 169 80 72 105 120 55
3679 190 340 145 121 210 240 152 170 350 77
3691 100 129 208 69 250 120 97 263 110 100
3704 170 110 92 25 108 116 60 107 79 146
3709 130 125 234 300 213 190 121 150 251 225
3714 65 147 106 85 70 52 120 80 135 100
3717 110 100 92 93 70 127 81 75 105 92
3730 70 90 64 100 200 128 111 120 90 80
3740 176 122 200 53 120 55 175 73 184 116
3763 61 32 126 250 105 67 85 91 125 90
3768 126 175 200 112 98 176 102 350 152 129
3773 380 321 300 250 130 320 118 320 360 120
3794 124 176 179 70 206 400 250 92 125 79
3800 161 90 144 85 156 70 95 173 93 90
3823 185 76 100 150 160 128 163 96 60 114
3825 42 67 121 49 53 33 49 49 67 53
3850 95 132 175 245 134 233 152 117 250 142
3855 132 125 135 223 102 95 124 133 120 135
3857 77 40 70 35 102 80 70 68 162 162
3858 50 42 101 60 45 140 80 50 130 105
3882 61 93 92 128 68 60 240 98 80 132
3887 143 115 168 133 142 250 187 95 127 100
3892 303 110 205 190 83 90 170 150 90 89
3902 120 310 235 155 179 160 155 190 75 190
3914 171 141 180 90 142 82 85 110 88 95
3928 120 300 50 400 180 230 260 350 250 350
3932 112 100 129 111 64 260 80 91 65 62
3945 201 84 60 153 181 61 141 152 145 200
3946 380 236 65 99 199 390 85 212 169 145
3947 92 32 92 63 119 96 85 77 66 122
3951 85 170 183 98 100 140 135 80 125 147
3955 106 70 219 61 94 100 120 115 80 105
3966 118 50 117 136 102 66 410 105 143 93
3992 188 78 88 105 210 114 110 130 85 110
4003 159 250 250 112 72 139 120 110 187 139
4023 178 333 500 170 500 121 169 200 246 20
4036 85 95 100 78 64 87 169 70 82 69
4049 58 33 85 165 67 31 19 121 53 75
4064 98 96 167 50 120 100 196 250 86 107
4069 210 114 100 74 135 178 107 75 340 158
4076 85 165 72 53 88 68 45 121 33 121
4082 137 150 161 165 46 177 147 80 167 195
4085 50 214 78 159 114 212 40 200 231 182
4096 104 130 143 160 169 300 165 200 204 212
4119 200 79 103 120 70 155 161 55 80 102
4159 208 148 115 211 100 55 120 100 201 198
4168 150 127 150 118 96 100 134 90 60 70
4173 150 45 78 83 72 73 105 76 209 72
4181 145 78 107 117 78 102 125 204 74 108
4191 185 86 115 107 119 136 130 135 300 70
4198 411 250 428 250 325 360 97 87 300 250
4199 137 175 130 75 200 76 148 40 350 127
4222 105 219 92 99 195 160 67 92 92 106
4223 80 109 95 110 170 92 110 69 85 120
4237 227 100 100 200 88 90 125 120 208 60
4246 140 150 60 159 102 90 366 80 72 90
4247 60 113 200 60 120 175 65 72 113 70
4256 191 135 154 371 300 144 65 107 320 113
4281 110 65 75 175 95 80 25 92 63 73
4295 111 70 115 60 80 123 79 100 91 51
4333 192 42 55 90 202 73 67 63 105 90
4349 81 165 203 95 97 150 115 140 120 189
4368 70 109 90 67 180 163 114 75 109 92
4373 200 70 170 84 120 124 91 150 100 83
4398 110 110 144 120 90 90 169 120 274 199
4411 100 380 110 121 380 345 100 394 121 62
4420 241 500 491 150 905 500 150 491 905 905
4433 164 115 145 160 38 169 115 116 208 55
4436 136 80 128 70 98 8 185 77 149 168
4440 125 197 186 400 187 260 100 90 135 178
4441 100 155 71 289 195 535 260 100 320 159
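A table like this can be easier to judge once it is summarized. The sketch below is a base R illustration only: it assumes each row is a case with a missing value and each of the ten columns is one imputed dataset (the layout mice uses for the entries of `$imp`), and it copies just the first three rows shown above into a hypothetical matrix named `imputed`.

```r
# First three rows of the imputation table (case id, then 10 imputed values).
# `imputed` is a hypothetical name used only for this illustration.
imputed <- rbind(
  "1513" = c(65, 81, 125, 127, 70, 84, 60, 66, 60, 76),
  "1530" = c(56, 140, 130, 161, 135, 82, 129, 80, 90, 60),
  "1535" = c(160, 128, 131, 160, 169, 198, 50, 115, 59, 126)
)
colnames(imputed) <- as.character(1:10)

# Spread of the imputed values for each case across the ten imputations.
case_spread <- data.frame(
  min = apply(imputed, 1, min),
  max = apply(imputed, 1, max),
  sd  = round(apply(imputed, 1, sd), 1)
)
print(case_spread)

# Mean imputed value within each imputed dataset.
imputation_means <- colMeans(imputed)
print(round(imputation_means, 1))
```

A wide spread within a row reflects uncertainty about that case's missing value, which is the variation that multiple imputation carries into the pooled standard errors.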
We can check the quality of the imputations with a strip plot, which is a single-axis scatter plot. It shows the distribution of each variable in each imputed dataset. We want the imputations to be plausible, meaning values that could have been observed had the data not been missing.
par(mfrow=c(7,2))
# one strip plot per variable
stripplot(Multiple_Imputation, Status, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Seniority, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Home, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Time, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Age, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Marital, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Records, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Job, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Expenses, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Income, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Assets, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Debt, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Amount, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Price, pch = 19, xlab = "Imputation number")
Next, we fit the analysis model to each imputed dataset and pool the results into a single set of estimates that properly accounts for the missing data. The with() function fits the model to every imputed dataset, and pool() combines the fits. The summary of the pooled results gives the estimate, standard error, test statistic, degrees of freedom, and p-value for each term in the model.
# fit complete-data model
fit <- with(Multiple_Imputation, glm(Status ~ Seniority + Home + Time + Age + Marital + Records + Job + Expenses + Income + Assets + Debt + Amount + Price, family = binomial))
# pool and summarize the results
summary(pool(fit))
               term       estimate    std.error    statistic         df      p.value
1       (Intercept)   9.814751e-01 7.334980e-01   1.33807480 4300.24373 1.809428e-01
2         Seniority   8.312981e-02 7.467015e-03  11.13293702 4362.76288 2.086587e-28
3         Homeother   6.234888e-02 5.733104e-01   0.10875241 4423.75455 9.134038e-01
4         Homeowner   1.150920e+00 5.595613e-01   2.05682607 4419.16607 3.976153e-02
5       Homeparents   9.428102e-01 5.677649e-01   1.66056444 4425.40067 9.687180e-02
6          Homepriv   4.243125e-01 5.770735e-01   0.73528329 4414.59126 4.622060e-01
7          Homerent   4.218033e-01 5.624968e-01   0.74987674 4426.28471 4.533688e-01
8              Time  -1.902847e-04 3.482487e-03  -0.05464046 4143.03176 9.564275e-01
9               Age  -1.105297e-02 4.990176e-03  -2.21494640 4252.79778 2.681655e-02
10   Maritalmarried   6.024318e-01 4.197272e-01   1.43529369 4207.32624 1.512778e-01
11 Maritalseparated  -6.781777e-01 4.640929e-01  -1.46129728 4250.10047 1.440078e-01
12    Maritalsingle   1.560922e-01 4.251030e-01   0.36718673 4231.26580 7.134981e-01
13     Maritalwidow   1.612457e-01 5.303590e-01   0.30403116 4103.30382 7.611196e-01
14       Recordsyes  -1.783461e+00 1.025704e-01 -17.38768069 3912.89251 2.746080e-65
15     Jobfreelance  -7.635440e-01 1.027294e-01  -7.43257641 3427.58461 1.337976e-13
16        Jobothers  -7.027346e-01 2.031269e-01  -3.45958441 3940.95314 5.467464e-04
17       Jobpartime  -1.472557e+00 1.258036e-01 -11.70520155 4411.06706 3.451846e-31
18         Expenses  -1.505503e-02 2.646683e-03  -5.68826244 2690.31713 1.421792e-08
19           Income   7.120761e-03 8.203147e-04   8.68052331   86.96945 2.029941e-13
20           Assets   2.248416e-05 6.719016e-06   3.34634685  325.31247 9.146867e-04
21             Debt  -1.693532e-04 3.847255e-05  -4.40192333  174.24479 1.866292e-05
22           Amount  -1.929480e-03 1.717220e-04 -11.23606588 4090.81562 7.151376e-29
23            Price   8.709653e-04 1.261683e-04   6.90320427 4339.59729 5.820776e-12
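The pooled estimates and standard errors in a summary like this come from Rubin's rules, which combine the per-imputation results. The following base R sketch writes those rules out for a single coefficient. The function name `pool_rubin` and the input numbers are made up for illustration and are not taken from the credit data analysis. The degrees of freedom shown are Rubin's classic large-sample formula; mice's pool() applies the Barnard-Rubin small-sample adjustment, so its df values will differ.

```r
# Rubin's rules for one coefficient, written out in base R.
# `pool_rubin` is a hypothetical helper, not a function from mice.
pool_rubin <- function(estimates, std_errors) {
  m <- length(estimates)                  # number of imputed datasets
  q_bar <- mean(estimates)                # pooled point estimate
  w <- mean(std_errors^2)                 # within-imputation variance
  b <- var(estimates)                     # between-imputation variance
  total_var <- w + (1 + 1 / m) * b        # total variance
  r <- (1 + 1 / m) * b / w                # relative increase in variance
  df_classic <- (m - 1) * (1 + 1 / r)^2   # classic large-sample df
  list(estimate = q_bar, std.error = sqrt(total_var), df = df_classic)
}

# Made-up estimates and standard errors from five imputed datasets.
est <- c(0.080, 0.085, 0.083, 0.081, 0.086)
se  <- c(0.0074, 0.0075, 0.0073, 0.0076, 0.0074)

pooled <- pool_rubin(est, se)
print(pooled)
```

The pooled standard error is always at least as large as the average within-imputation standard error. The extra between-imputation term is how the uncertainty about the missing values enters the final inference, and a variable with a lot of missing information ends up with fewer degrees of freedom.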
In conclusion, missing data can occur in research for a variety of reasons, and it is never a good idea to ignore it. Ignoring it leads to biased parameter estimates, loss of information, decreased statistical power, and weak reliability of findings (Dong and Peng 2013). The best course of action is to handle the missing data with multiple imputation. When missing data is discovered, first identify it and look for missing data patterns. Next, choose the variables in the dataset that are related to the missing values and will be used for imputation. Create the necessary number of complete datasets. Fit the analysis model to each imputed dataset, pool the results, and finally interpret the pooled estimates. Performing these steps will minimize the adverse effects of missing data on the analysis (Pampaka, Hutcheson, and Williams 2016).